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LARGE-SCALE  LINEAR  PROGRAMMING 


by  George  B.  Dantzig* 

Large-Scale  Systems  and  the  Computer  Revolution: 

i 

From  its  very  inception,  it  was  envisioned  that  linear 
programming  would  be  applied  to  very  large,  detailed  models  of  economic 
and  military  systems.  Kantorovitch's  1939  proposals,  which  were  before 
the  advent  of  the  electronic  computer,  mentioned  such  possibilities,  [78]. 
Linear  programming  evolved  out  of  the  U.S.  Air  Force  interest  in  1947  in 
finding  optimal  time-staged  deployment  plans  in  case  of  war,  [126];  a 
problem  whose  mathematical  structure  is  similar  to  that  of  finding  an 
optimal  growth  pattern  of  a  developing  economy  and  similar  to  other 
control  problems,  [41],  [58],  [123].  Structurally  the  dynamic  problems 
are  characterized  in  discrete  form  by  staircase  matrices  representing 
the  inputs  and  outputs  from  one  time  period  to  the  next.  Treated  as  an 
ordinary  linear  program,  the  number  of  rows  and  columns  grows  in  proportion 
to  the  number  of  time  periods  T  and  the  computational  effort  grows  by 
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T  and  possibly  higher.  This  fact  has  limited  the  use  of  linear 
programming  as  a  tool  for  planning  over  many  time  periods. 


*Departments  of  Operations  Research  and  Computer  Science,  Stanford 
University,  Stanford,  California.  Research  of  G.B.  Dantzig  partially 
supported  by  Office  of  Naval  Research,  Contract  ONR-N-00014-67-A-0112-0011; 
U.S,  Atomic  Energy  Commission,  Contract  AT(04-3)-326  PA  #18;  National 
Science  Foundation  Grant  GP  6431;  and  U.S.  Army  Research  Office,  Contract 
No.  DAHC04-67-C-0028. 

Reproduction  in  whole  or  in  part  for  any  purpose  of  the  United  States 
Government  is  permitted. 


At  the  present  1967  stage  of  the  computer  revolution, 
there  is  growing  interest  on  the  part  of  practical  users  of  linear 
programming  models  to  solve  larger  and  larger  systems  [40].  Such 
applications  imply  that  eventually  automated  systems  will  obtain 
information  from  counters  and  sensing  devices,  process  data  into 
the  proper  form  for  optimization  and  finally  implement  the  results  by 
control  devices.  There  has  been  steady  progress  in  this  mechanization 
of  flow  to  and  from  the  computer.  Hitherto,  this  has  been  one  of  the 
obstacles  encountered  in  setting-up  and  solving  large-scale  systems, 
[113].  The  second  obstacle  has  been  the  cost  and  the  time  required  to 
successfully  solve  large  problems,  [74]. 

It  is  difficult  to  measure  the  potential  of  large-scale 
planning.  Certain  developing  countries  appear,  according  to  optimal 
calculations  on  simplified  models  to  be  able  to  grow  at  the  rate  of 
15%  per  year  implying  a  doubling  of  their  industrial  base  in  5  years. 
But  administrators  apparently  ignore  plans  and  make  decisions  based  on 
political  expediency  which  restrict  growth  to  2  or  3%  and  sometimes  -2%. 
It  is  the  belief  of  the  author  that  the  mechanization  of  data  flow  (at 
least  in  advanced  countries)  in  the  next  decade  will  provide  pathways 
for  constructing  large  models  and  the  effective  implementation  of  the 
results  of  optimization.  This  application  of  mathematics  to  decision 
processes  will  eventually  become  as  important  as  the  classical 
applications  to  physics  and  will,  in  time,  change  the  emphasis  in  pure 
mathematics. 
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Three  Approaches  to  Solving  Large-Scale  Systems: 


There  have  been  a  great  number  of  papers  on  this  subject  as  evidenced 
from  the  list  of  references  attached.  I  have  broadly  classified  them 
into: 


I  Decomposition  Principle 

(Sub-optimization  using  interior  path) 

II  Compact  Inverse 

(Using  a  simplex  variant) 

III  Parametric  Variation 

(Sub-optimization  using  simplex  variant) 

The  aim  is  to  say  a  little  about  each,  citing  some  references  and  some 
structures  to  which  they  are  applicable.  We  shall  begin  with 

I;  The  Decomposition  Principle.  [47] : 

Consider  the  non-linear  programming  problem:  Find 
x  *  (x^,...,xn)  such  that 

(1)  g (x)  =  Min 

fx(x)  <  0  :\1 

f2(x)  1  0  :X2 

f^(x)  ^  0 

OxT T ’  o 

m 

We  assume  g(x)  and  f^x)  are  convex  functions  of  x  .  Assigning 
Lagrange  Multipliers  X^  to  a  subset  of  the  constraints,  say  the  first 
two,  we  obtain  the  SUBPROBLEM:  Find  x  and  Min  0(x)  satisfying 

(2)  0(x)  -  g(x)  +  X^U)  +  X2f2(x) 

f.(x)  <,  0,...,f  (x)  ±  0 
j  —  m 
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■J 


j 


1 


Theorem:  If  for  given  A^  ,  x  *  R  solves  the  subproblem  (2)  and  If 
f±W  <.  0  for  all  i  and  A^U)  =  0  for.  i  =  (1,2)  then  x  -  St 
solves  (1) . 


We  shall  discuss  a  method  where  we  assign  values  to  A^ 
and  if  the  resulting  x  =  &  does  not  satisfy  the  conditions  in  the  theorem. 


this  fact  can  be  used  to  improve  the  values  of  A 
(la)  Example: 


i  * 


FIND  x  >  0  ,  Min  f  (x) : 
—  o 


(3)  c^  +  c2  x2  +  c^  x3  +  c^  x^  +  c^  x5 

allXl  +  a12x2  +  a13X3  +  a14X4  +  a15X5 

a21xi  +  a22X2 

a31Xl  +  a32x2 


a43X3  +  a44X4 
a53X3  +  a54X4 


"  fo(x) 
-  b. 


—  b2 
~  b3 


—  b4 

—  b5 


GUESSl 


>  SUBPROBLEM 


a65X5  b6 


01(x1,x2)  +  02(x3,x^)  +  03(x5)  =  0(x)  Min  J 


where  03  =  (^  +  ^a11)x1  +  (c2  +  Aa12)x2;  02  *  (c3  +  Aa13)x3  +  (c^  +  Aa^x^; 

03  -  (c5  +  Xa15)x5 


Note  that  the  subproblem  decomposes  into  three  separate  problems;  hence 
the  term:  "Decomposition  Principle". 

(Ib)  Equivalent  Generalized  Linear  Program: 


Returning  to  problem  (1)  we  now  restate  it  in  the  form  of 
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Wolfe's  Generalized  Linear  Program,  [38,  Chapter  22],  This  differs  from 


an  ordinary  linear  program  in  that  the  coefficients  in  each  column  P 

J 

instead  of  being  fixed  are  freely  drawn  from  a  convex  set  C  .  It  can 

T 

be  shown  that  the  following  problem  is  equivalent  to  (1) .  \ 


FIND  Min 


z  ,  w±  _>  0  such  that 


0  1-1 


gCx1) 


g(xfc) 


fl(x) 


0  0  f,  (x  )  f  (x  )  f,  (x) 

(4)  _>  z+  ,  w,+. ..+  w  + 

0  0  f 2 (x  )  f 2 (x  )  f2^x) 


RESTRICTED  MASTER 


g(x) 


where  x  and  x  satisfy  f0(x)  <  0,...,f  (x)  <  0  and  the  solution 

j  —  m  — 


to  (1)  is 


V  i 
X  -  I  w±x 


+  wx. 


:ic)  Iterative  Process: 


At  the  start  of  iteration  t  ,  x^',...,xt  are  known.  An 

improved  guess  of  ( A ^  ,  ^2^  an<*  a  new  x  =  xt+^  is  obtained  by  solving 
the  "restricted  master"  linear  programming  indicated  in  (4).  Let  the 
optimal  dual  variables  be  (1,  A  J ,  yt)  and  let  w^  =  w^  be  the 

optimal  primal  variables.  Then 


(6)  Rtc  l  wj  x1 

i-1 

is  an  optimal  solution  to  (1)  if 


(7)  Min  [g (x)  +  A*  f^x)  +  A*  f2(x)  +  u)  ±  0  , 


f1(x)  ^  0  , 


f2(x)  <.  0  ; 
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i.e.  if  the  last  column  "prices  out"  non-negative  for  all  admissable  x  . 

But  (7)  is  the  same  as  solving  the  subproblem  (2)  using  (X^,  X^  ■  (X^,  X*)  . 
If  in  (7),  x  -  xt+1  yields  a  Min  <  0  ,  this  x  is  used  to  generate  a 
new  column  of  (4). 

The  successive  satisfy  f^(x)  0  for  all  i  and 

gO**")  -*  Min  g(x)  .  The  iterative  process  is  finite  when  applied  to  a 
linear  program  like  the  preceeding  example  (3). 

This  completes  our  discussion  of  the  decomposition  approach. 

To  be  useful,  the  generated  subproblems  must  be  easy  to  solve,  [38, Chapter  24]. 

(II)  Compact  Inverse: 

The  second  approach  accepts  the  standard  simplex  or  any  of  the 
numerous  variants  and  tries  to  arrange  the  arithmetic  to  take  advantage 
of  structure.  It  is  clear  that  if  the  number  of  iterations  is  fixed,  the 
only  savings  can  come  from  doing  each  iteration  efficiently:  i.e.  doing 
the  pricing  and  those  operations  involving  the  inverse  of  the  basis 
efficiently. 

(Ila)  Sparse  Matrices: 

The  larger  problems  become  the  lower,  in  practice  become  the 
density  of  non-zero  coefficients.  For  problems  of  200  equations  a  density 
of  5%  is  typical;  for  larger  problems  the  density  drops  to  .5%  or  less. 

It  is  possible, however,  that  the  inverses  of  bases  drawn  from  such 
matrices  to  be  100%  dense,  for  example: 
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However,  if  the  inverse  is  expressed  as  products  of  elementary  matrices  of 
either  the  row  or  column  type  or  both  in  any  order,  the  number  of  off- 
diagonal  non-zeros  in  this  representation  can  often  be  quite  low. 


Unsolved  prohlem;  Given  a  basis  express  the  inverse  as  the  product  of 
elementary  matrices  such  that  the  count  of  off-diagonal  non-zero  elements 
is  minimal. 


Markowitz  [  90]  proposed  that  the  elementary  matrices 
correspond  to  upper  and  lower  dlagonalization  operations  using  as  pivot 
element  the  one  that  locally  creates  as  few  additional  non-zeros  as 
possible.  Variations  of  this  idea  have  been  incorporated  in  commercial 
codes  in  the  early  1960’s,  see  [43].  The  inverse  of  a  5%  dense  basis 
often  running  not  more  than  7%  dense  and  the  running  time  often  is  cut 
by  a  factor  of  5.  In  example  (8),  the  inverse  in  product  form  has  the 
same  number  of  non-zeros  as  the  originating  basis. 

(lib)  Dynamic  Structures; 

As  noted  earlier  these  have  important  applications  [95]. 
One  such  is  to  linear  control  processes,  see  [114],  [128].  As  early  as 
1954,  the  author  published  a  paper  on  how  to  compact  the  inverse 
representation  of  the  basis  with  a  staircase  structure,  (9) ;  see  [32]. 
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Again,  in  1963,  I  discussed  another  method  which  also  permitted  one  to 
find  a  compact  inverse  and  efficiently  maintain  the  compactness  in  moving 
from  one  iteration  to  the  next,  [37].  There  have  been  other  proposals,  all 
excellent,  that  seek  to  apply  the  simplex  method  to  the  full  system  by 
compacting  the  inverse.  As  far  as  I  know,  none  of  these  direct  proposals 
have  been  realized  in  computer  codes.  See  [5],  [56],  [71]. 


(9) 


1st  period 
input 


~A1 

|lst  period 
output 


2nd  period 
input 


"A2 

2nd  period 
output 


3rd  period 
input 


"A3 

3rd  period 
output 


4th  period 
input 


_A4 

|4th  period 
output 


An  important  special  case  is  the  Dynamic  Leontief  Economic 
Model  with  Substitution  [33].  Another  Special  Case  is  a  Markov  Process 
with  Alternative  Policies  [125],  [76],  These  cases  are  known  to  be 
mathematically  equivalent  and  to  have  a  remarkably  simple  solution.  A 
Leontief  System  is  defined  by:  (1)  a  non-negative  right  hand  side. 
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(2)  exactly  one  positive  coefficient  in  each  column,  and  (3)  the 
existence  of  a  feasible  solution  for  some  positive  right  hand  side. 

In  the  dynamic  case,  we  further  assume  that  the  positive  coefficient 
always  appears  in  the  input  block  along  the  diagonal. 

Theorem:  The  optimal  choice  of  basic  columns  associated  with  the  last 
period  is  independent  of  the  choice  in  prior  periods. 

This  permits  the  determining  of  optimal  basis  and  Lagrange  multipliers 
for  the  last  block  of  equations.  Weighting  these  equations  by  their 
multipliers,  the  last  period  equations  are  subtracted  from  the  cost 
equations  to  produce  a  modified-cost  equation.  Dropping  these  equations, 
the  optimal  choice  of  columns  for  the  next-to-last  period  and  prices  are 
next  determined  using  the  modified-cost  equation;  etc.  backwards  in  time 
until  the  first  period  is  reached.  When  the  basic  columns  of  the  first 
period  become  known,  the  value  of  its  basic  variables  can  be  calculated, 
these  in  turn  can  be  used  to  determine  those  of  the  second  period,  etc. 
forward  in  time.  [118]. 

The  essential  characteristic  of  the  basis  in  the  dynamic 
Leontief  case  and  in  the  Markov  Process  case  is  that  the  blocks  of  non¬ 
zero  coefficients  are  square  and  non-singular  and  the  entire  basis  is 
block  triangular.  Hence  only  the  inverses  of  blocks  along  the  diagonal 
are  needed;  the  rest  of  the  calculations  can  be  done  by  substitution 
below  the  diagonal.  An  ideal  block-triangular  structure!  Unfortunately, 
the  general  staircase  problem  does  not  have  this  property.  It  would  be 
very  worthwhile  to  see  if  one  can  find  a  meaningful  economic  extension  of 
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Che  Leontief  model  (like  Che  inCroducClon  of  activities  ChaC  generate 
capical)  chac  is  craccable. 


(lie)  Block  Andular  SvsCems;  These  conalsc  of  M  general  linear 
equaCions  and  L  sees  of  equacions  which  have  no  variable  in  common. 
The  blocks  of  non-zero  coefficienCs  are  deplcCed  below. 


(10) 


Several  proposals  have  been  made  to  compact  the  inverse,  see 
particularly  Bennett  [21];  also  [79],  [106].  Essentially  they  all 
chose  square  non-singular  submatrices  B^  from  the  basis  along  the 
block  diagonal  which  are  used  as  block  pivots  to  initiate  the 
elimination.  After  the  elimination,  a  square  m  x  m  submatrix  is 
left.  Many  practical  problems  satisfy  this  structure.  One  important 
subclass  are  the  multi-commodity  network  problems.  [54],  [77], 
sometimes  referred  to  as  the  traffic  assignment  problem  [24]: 


(ID 


Find 


Min  z  : 


l 

k 


(l»j»k)  “  l,...,n. 
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[  xlpk  '  I  xpjk  ■  ‘pk 

(p»k)  -  1, . . . ,n 

111  X441,  "  2 

J  ]  J  ijk  ijk 

In  another  type  of  application  involving  the  allocation  of  many  orders 
to  several  plants,  the  diagonal  blocks  consist  of  one  equation  each. 
Such  a  system  is  referred  to  as  a  generalized  upper-bound  structure. 
[46].  In  one  application  L  ■  4000  and  M  ■  20  .  An  important 
property  of  such  systems  is  that  when  L  is  large  relative  to  M  most 
(in  fact  L-M  or  more)  of  the  diagonal  equations  have  exactly  one 
basic  variable  among  the  set  of  its  variables.  The  fact  that  most 
basic  variables  are  at  their  upper-bound  value  can  be  used  to  advantage. 
The  first  code  along  these  lines  was  developed  by  M.  Kasatkin  and 
J.  Bigelow  for  a  problem  of  Crown  Zellerbach  paper  corporation. 

Running  time  on  an  example  was  about  1/10  the  time  thac  was  required 
by  a  general  purpose  code.  See  also  [65]. 

Bordered  Angular  Systems:  This  consists  of  blocks  along  the 
diagonal  of  non-zero  coefficients  and  a  border  of  non-zeros  along  the 
top  and  left. 

(12) 

This  structure  is  sufficiently  general  yet  specialized  to  usefully  cover 


lid 
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a  majority  of  current  applications  except  the  staircase  type. 
Generalization  (of  the  procedures  just  discussed)  have  been  made  by 
heestennan,  [72].  Ritter  [99]  has  generalized  Rosen's  parametric 
scheme,  [103]. 

III.  Parametric  Variation: 

The  third  and  last  approach  depends  on  the  system  being 
weakly  linked  i.e.  on  the  existence  of  a  few  rows  and  columns  which, 
if  removed,  makes  the  solution  of  the  remaining  system  trivial.  For 
example,  a  network-flow  problem  with  an  extra  budget  constraint.  By 
assigning  a  Lagrange  Multiplier  to  the  latter,  the  constraint  could 
be  removed  and  the  objective  equation  modified  by  adding  to  it  the 
multiple  of  the  removed  constraint.  The  resulting  pure  network  could 
then  be  easily  solved.  If  the  solution  does  rot  satisfy  the 
constraint  and  complementary  slackness  conditions,  then  the  Lagrange 
Multiplier  is  varied  until  it  does.  This  is  also  the  idea  behind  the 
decomposition  principle  but  the  proposed  methods  of  variation  (such  as 
those  below)  are  more  direct: 

Rosen:  "Partition  Programming"  [103],  Ritter  [99]. 

Kron:  "Diakoptics"  [83]. 

Balas:  "Infeasibility  Pricing"  [10]. 

Beale:  "Pseudo  Basic  Variables"  [17]. 

Abadie  &  Williams:  [3]. 

Gass:  "Dualplex  Method"  [59]. 

(Ilia)  Dualplex  Method: 

As  representative  of  the  parametric  approaches  I  have  selected 
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Gass'  "Dualplex  Method"  which  is  related  to  Rosen's  "Partition 


Pivoting  allowed  here  Not  here 

X  >  0  Y  >  0 


that  pivoting  in  the  right  hand  interconnecting  part  would  destroy  the 
angular  structure  but  pivoting  anywhere  in  E^,  E^,  E^  would  not. 

We  assume  that  for  a  given  Y  ■  Y°  ^  0  (variables  associated  with  D  ) 
a  feasible  solution  X  *  X°  >_  0  exists  and  is  optimal.  Let  the  system 
be  reduced  to  optimal  canonical  form  restricting  pivots  to  only  columns 
of  E^^  : 

(14)  IXfi  +  AXj^  +  DY  -  b 

cXX7  +  dY  -  z  -  Z  (Max) 

N  0 

where  X^  are  basic  variables  and  X^,Y  non-basic.  Holding  X^  ■  0 
for  the  moment,  we  solve  the  subproblem 

(15)  DY  <.  b  ,  Y  >.  0  ,  Max  dY  . 


i 
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The  dual  of  this  subproblem  Is 


(16) 


JID  I  d  ,  n  >  0  ,  Min  nb  . 


-T 

Since  D  is  presumed  to  consist  of  few  rows  and  many  columns,  it  is 
suitable  for  solution  by  the  standard  simplex  method.  Let  n “11^  be 
an  optimal  solution  and  Y  -  Y*  _>  0  be  optimal  to  its  dual.  Denote 
by  the  i-th  row  of  D  and  by  Aj  the  j-th  column  of  A  .  Let 

the  basic  Xfi  be  partitioned  into  Xj  -  0  and  X^  >  0  according  as 
components  xi  *  0  or  xi  >  0  where  xi  +  DjY'  -  b^  ;  Let  the 
non-basic  Xj^  be  partitioned  into  X^  and  XJV  according  as 

5j  ■  h  '  nJ)  >  0  or  i  0  • 


sm  »  0 

«iVi° 

Xj  -  0 

XII > 0 

xm  ’  0 

xiv-° 

Y’  >.  0 

(17) 


1 

Block 

• 

1  . 

Pivot 

• 

• 

’1 

• 

-  -  -  -  - 

• 

• 

• 

• 

A 

1 

• 

• 

• 

c  . 

z-z  (Max) 
o 


The  block  pivot: 

The  next  step  is  to  find  the  block  pivot  of  highest  rank  that 
switches  the  role  of  as  many  basic  and  nonbasic  variables  in  X^  and 
^III  as  Poss*-ble«  Since  both  sets  are  at  zero  value  this  does  not 
effect  the  current  feasible  solution.  If  there  is  a  choice  of  block 
pivot  its  columns  are  selected  from  those  with  highest  6^  values. 
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After  the  pivot  the  new  dual  subproblem  la  solved  using  as  starting 
basis,  the  one  corresponding  to  the  final  basis  of  the  previous 
subproblem.  Y'  ,>  0  is  still  a  feasible  price  vector  of  the  dual 
subproblem  but  11*  no  longer  satisfies  it.  However, 

Theorem  (Gass): 

If  after  the  block  pivot  those  components 
corresponding  to  6^  >  0  are  replaced  by  the  value  -6^  ,  the  new 
n  constitutes  an  infeasible  basic  solution  to  the  new  subproblem; 

Y'  _>  0  remains  as  a  feasible  vector  of  dual  simplex  multipliers. 

Because  of  infeasibility,  the  new  subproblem  can  be  improved 
(using  the  dual  simplex  method).  This  is  repeated  iteratively  until  all 
6^  _<  0  or  z  -*■  +  «  .  Associated  with  each  iteration  is  a  basic  feasible 
to  the  full  problem  so  that  usual  proof  of  a  finite-number-of-iterations 
applies. 

The  parametric  methods  should  be  regarded  as  important 
variants  of  the  standard  simplex  process. 

Concluding  Remarks: 

This  completes  the  survey  of  the  three  types  of  approaches 
to  solving  large-scale  systems:  Decomposition,  Compact  Inverse,  and 
Parametric  Variation,  and  of  the  type  of  matrix  structures  that  each 
are  best  suited.  Little  has  been  said  about  how  different  proposals 
compare  on  test  problems.  At  present,  there  does  not  appear  to  be  a 
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practical  way  to  do  this.  The  program  of  instructions  for  the  computer 
are  often  an  order  of  magnitude  more  complex  than  a  good  commercial 
linear  program  system  and  the  latter  can  cost  two  to  five  hundred 
thousand  dollars  to  develop.  The  author  feels  that  better  computer 
languages  have  to  be  developed  to  facilitate  the  experimental  coding 
and  comparision  of  large-scale  system  proposals,  [74]. 
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